Kernel-Based Clustering of Big Data

Author

  • Radha Chitta
Abstract

The volume of digital data has increased rapidly in recent years. Analysis of this data, popularly known as big data, necessitates highly scalable data analysis techniques. Clustering is an exploratory data analysis tool used to discover the underlying groups and structures in the data. State-of-the-art scalable clustering algorithms assume “linear separability” of the clusters, and do not perform well on real-world data sets. Kernel-based clustering algorithms, which use non-linear similarity measures, are more accurate, but are not scalable to data sets containing billions of high-dimensional points from thousands of clusters. We propose scalable approximate kernel-based clustering algorithms, based on random sampling and efficient optimization. The proposed algorithms are as efficient as linear clustering algorithms, and achieve cluster quality comparable to that of classical kernel-based clustering algorithms. We demonstrate the efficiency and effectiveness of the proposed algorithms on several diverse large-scale data sets.
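
The abstract does not spell out the proposed algorithms, so the following is only a minimal sketch of the general idea of combining random sampling with a linear clustering step. It assumes a Nyström-style approximation: a random subset of points (called landmarks here) yields explicit features whose inner products approximate a Gaussian kernel, and ordinary k-means then runs on those features. The function names, the kernel width `gamma`, the landmark count, and the toy data are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def nystrom_features(X, n_landmarks, gamma, rng):
    """Explicit features whose inner products approximate the RBF kernel.

    Only an n x m and an m x m kernel block are computed (m = n_landmarks),
    never the full n x n kernel matrix.
    """
    idx = rng.choice(X.shape[0], size=n_landmarks, replace=False)
    landmarks = X[idx]
    K_mm = rbf_kernel(landmarks, landmarks, gamma)
    K_nm = rbf_kernel(X, landmarks, gamma)
    eigvals, eigvecs = np.linalg.eigh(K_mm)
    # Drop near-zero eigenvalues to keep the inverse square root stable.
    keep = eigvals > 1e-8 * eigvals.max()
    return K_nm @ eigvecs[:, keep] / np.sqrt(eigvals[keep])


def kmeans(Z, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-first initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    labels = None
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = Z[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


# Toy data (an assumption for illustration): two tight, well-separated blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(100, 2)),
    rng.normal(5.0, 0.1, size=(100, 2)),
])
Z = nystrom_features(X, n_landmarks=20, gamma=0.5, rng=rng)
labels = kmeans(Z, k=2)
```

The toy data is linearly separable and only checks the plumbing. On data with non-linear cluster structure, the choice of kernel, its width, and the number of landmarks would all need tuning, and the result is an approximation of full kernel k-means rather than an equivalent.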

Similar resources

Kernel Spectral Clustering and applications

In this chapter we review the main literature related to kernel spectral clustering (KSC), an approach to clustering cast within a kernel-based optimization setting. KSC represents a least-squares support vector machine based formulation of spectral clustering described by a weighted kernel PCA objective. Just as in the classifier case, the binary clustering model is expressed by a hyperplane i...

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...

The Application of Clustering Optimization in Data Mining Based on Multiple Kernel Function FCM Algorithm

Building on fuzzy c-means clustering with a single kernel function, this paper aims to improve data clustering in the field of data mining and puts forward a fuzzy c-means clustering algorithm based on multiple kernel functions (MKFCM). Under a fully unsupervised learning method, a combination of Gaussian kernel functions is assigned different weight...

Semi-Supervised Composite Kernel Learning Using Distance Metric Learning Techniques

Distance metric has a key role in many machine learning and computer vision algorithms so that choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...

Kernel Spectral Clustering for Big Data Networks

This paper shows the feasibility of utilizing the Kernel Spectral Clustering (KSC) method for the purpose of community detection in big data networks. KSC employs a primal-dual framework to construct a model. It results in a powerful property of effectively inferring the community affiliation for out-of-sample extensions. The original large kernel matrix cannot fit into memory. Therefore, we sel...

Publication date: 2015